Cache Use and Cache Coherency

The primary and secondary caches shown in Figure 1-1 are essential to CPU performance. Access to cache memory is roughly an order of magnitude faster than access to main memory. Execution speed remains high only as long as a very high proportion of memory accesses are satisfied from the primary or secondary cache.
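The effect of the hit rate can be illustrated with a little arithmetic. The following is a minimal sketch, not a measurement of any real system: the function name effective_access_time() and the cost figures are assumptions chosen only to reflect the order-of-magnitude gap described above (1 time unit for a cache access, 10 for a main memory access).

```c
#include <assert.h>

/*
 * Average cost of one memory access, in arbitrary time units.
 * hit_rate is the fraction of accesses satisfied from cache (0.0 to 1.0).
 * cache_cost and memory_cost are illustrative figures, not real timings.
 */
double effective_access_time(double hit_rate, double cache_cost,
                             double memory_cost)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * cache_cost + (1.0 - hit_rate) * memory_cost;
}
```

With these assumed costs, a 99% hit rate gives an average of about 1.09 units per access, while a 90% hit rate gives about 1.9 units, nearly twice as slow.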

The use of caches means that there are often multiple copies of data: a copy in main memory, a copy in the secondary cache (when one is used) and a copy in the primary cache. Moreover, a multiprocessor system has multiple CPU modules like the one shown, and there can be copies of the same data in the cache of each CPU.

The problem of cache coherency is to ensure that all cache copies of data are true reflections of the data in main memory. Different Silicon Graphics systems use different hardware designs to achieve cache coherency.

In most cases, cache coherency is achieved by the hardware, without any effect on software. In a few cases, specialized software, such as a kernel-level device driver, must take specific steps to maintain cache coherency.

Cache Coherency in Multiprocessors

Multiprocessor systems need more complex cache coherency protection because the same data can reside in several caches at once. In a multiprocessor system, the hardware ensures that cache coherency is maintained under all conditions, including DMA input and output, without action by the software. However, in some systems the cache coherency hardware works correctly only when a DMA buffer is aligned on a cache-line-sized boundary. To ensure this alignment, pass the KM_CACHEALIGN flag when allocating buffer space with kmem_alloc() (see the kmem_alloc(D3) reference page).
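What "aligned on a cache-line-sized boundary" means can be shown with a user-space sketch. This is not driver code: in a driver you use kmem_alloc() with KM_CACHEALIGN as described above. The names CACHE_LINE_SIZE, round_up_to_cache_line(), is_cache_aligned(), and alloc_dma_buffer_sketch() are hypothetical, the 128-byte line size is an assumption (the real value depends on the system), and the standard C11 function aligned_alloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size in bytes; the real value depends on the system. */
#define CACHE_LINE_SIZE 128

/* Round a byte count up to a whole number of cache lines. */
size_t round_up_to_cache_line(size_t nbytes)
{
    return (nbytes + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Return nonzero if addr starts on a cache line boundary. */
int is_cache_aligned(const void *addr)
{
    return ((uintptr_t)addr % CACHE_LINE_SIZE) == 0;
}

/*
 * User-space stand-in for a cache-aligned buffer allocation.
 * aligned_alloc() (C11) requires the size to be a multiple of the
 * alignment, so the request is rounded up to whole cache lines.
 * Release the buffer with free().
 */
void *alloc_dma_buffer_sketch(size_t nbytes)
{
    assert(nbytes > 0);
    return aligned_alloc(CACHE_LINE_SIZE, round_up_to_cache_line(nbytes));
}
```

Under these assumptions, a 300-byte request is rounded up to 384 bytes (three lines), and the returned address is a multiple of 128.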

Note: In one specific hardware configuration of Challenge and Onyx systems, a hardware problem can cause cache coherency errors that a device driver must deal with; see Appendix B, "Challenge DMA with Multiple IO4 Boards."

Cache Coherency in Uniprocessors

In some uniprocessor systems, the CPU cache can hold newer information than appears in memory. This is a problem only when a device is going to perform DMA. In these systems, before DMA output a device driver calls a kernel function to ensure that all cached data has been written to memory (see the dki_cache_wb(D3) reference page). Likewise, after DMA input the driver calls a kernel function to ensure that the CPU receives the latest data rather than a stale cached copy (see the dki_cache_inval(D3) reference page). In a multiprocessor these functions do nothing, but it is always safe to call them.
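The reason for the two calls can be seen in a toy model of a write-back cache. The sketch below is a user-space simulation only, not kernel code. Every name in it (struct toy_system, toy_cache_wb(), toy_cache_inval(), and so on) is hypothetical; the two toy cache functions merely stand in for the write-back and invalidate steps described above and do not reproduce the interface of the real kernel functions.

```c
#include <assert.h>
#include <string.h>

#define TOY_SIZE 8

/* One buffer as seen in main memory and in a write-back CPU cache. */
struct toy_system {
    unsigned char memory[TOY_SIZE]; /* what a DMA device reads and writes */
    unsigned char cache[TOY_SIZE];  /* what the CPU reads and writes */
    int cache_valid;                /* cache holds a copy of the buffer */
    int cache_dirty;                /* cached copy is newer than memory */
};

void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);
}

/* Load the buffer into the cache if it is not already there. */
static void toy_fill(struct toy_system *s)
{
    if (!s->cache_valid) {
        memcpy(s->cache, s->memory, TOY_SIZE);
        s->cache_valid = 1;
    }
}

/* CPU store: changes the cached copy only (write-back behavior). */
void toy_cpu_write(struct toy_system *s, int i, unsigned char v)
{
    assert(i >= 0 && i < TOY_SIZE);
    toy_fill(s);
    s->cache[i] = v;
    s->cache_dirty = 1;
}

/* CPU load: served from the cache, which is filled on first use. */
unsigned char toy_cpu_read(struct toy_system *s, int i)
{
    assert(i >= 0 && i < TOY_SIZE);
    toy_fill(s);
    return s->cache[i];
}

/* Write-back step: copy dirty cached data to memory before DMA output. */
void toy_cache_wb(struct toy_system *s)
{
    if (s->cache_valid && s->cache_dirty) {
        memcpy(s->memory, s->cache, TOY_SIZE);
        s->cache_dirty = 0;
    }
}

/* Invalidate step: discard the cached copy after DMA input. */
void toy_cache_inval(struct toy_system *s)
{
    s->cache_valid = 0;
    s->cache_dirty = 0;
}

/* Device side: DMA transfers touch main memory only, never the cache. */
unsigned char toy_dma_output(const struct toy_system *s, int i)
{
    return s->memory[i];
}

void toy_dma_input(struct toy_system *s, int i, unsigned char v)
{
    s->memory[i] = v;
}
```

In this model, a value stored with toy_cpu_write() is invisible to toy_dma_output() until toy_cache_wb() runs, and a value delivered by toy_dma_input() is invisible to toy_cpu_read() until toy_cache_inval() discards the stale cached copy.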